About

Introduction

This storyboard presents the results of 12 case studies performed in the book, “Text Mining: An Uncharted Territory for Librarians”.


How to Cite:

Virtual RStudio Server

Case Study | Title | Virtual RStudio Server
1B | Clustering of Documents using R | link
4C | Topic Modeling of Documents using R | link
5B | Network Text Analysis of Documents using Textnets package of R | link
6B | Burst Detection of Documents using R | link
7B | Sentiment Analysis of Documents using R | link
9A | To Make a Dashboard using R | link

Reproduce the analysis in the cloud without installing any software. The authors' computational environment runs on BinderHub. Click a link to open an interactive virtual RStudio environment for hands-on practice with the case studies that used the R programming language.

Virtual Jupyter Notebook

Case Study | Title | Virtual Jupyter Notebook
1B | Clustering of Documents using R | link
4C | Topic Modeling of Documents using R | link
5B | Network Text Analysis of Documents using Textnets package of R | link
6B | Burst Detection of Documents using R | link
7B | Sentiment Analysis of Documents using R | link
9A | To Make a Dashboard using R | link

Reproduce the analysis in the cloud without installing any software. The authors' computational environment runs on BinderHub. Click a link to open an interactive virtual Jupyter Notebook for hands-on practice with the case studies that used the R programming language.

1A

Heatmap showing distances between documents

©2021 Lamba and Madhusudhan - all rights reserved


The heatmap plot shows the distances between the documents.

Clustered Heatmap showing distances between documents

©2021 Lamba and Madhusudhan - all rights reserved


The clustered heatmap plot shows another way to visualize the distances between the documents.

Dendrogram showing hierarchical clustering of documents

©2021 Lamba and Madhusudhan - all rights reserved


The dendrogram presents the hierarchical clustering of documents using Ward's method.
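As a rough illustration of what Ward's method does, the Python sketch below repeatedly merges the two clusters whose union gives the smallest increase in within-cluster sum of squares. It is not the code used in the case study: the 2-D points and the `doc_a` to `doc_e` labels are hypothetical, and the merge costs it records are not guaranteed to match the heights reported by any particular clustering tool.

```python
from itertools import combinations

# Hypothetical 2-D document coordinates (assumption; not the case study's data).
points = {
    "doc_a": (1.0, 1.0),
    "doc_b": (1.5, 1.0),
    "doc_c": (5.0, 5.0),
    "doc_d": (5.0, 6.0),
    "doc_e": (9.0, 1.0),
}

def centroid(members):
    """Mean position of the documents in a cluster."""
    coords = [points[m] for m in members]
    return tuple(sum(c) / len(coords) for c in zip(*coords))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared

# Start with every document in its own cluster, then merge greedily.
clusters = [(name,) for name in points]
merges = []
while len(clusters) > 1:
    a, b = min(combinations(clusters, 2), key=lambda pair: ward_cost(*pair))
    merges.append((a, b, ward_cost(a, b)))
    clusters.remove(a)
    clusters.remove(b)
    clusters.append(a + b)

for a, b, cost in merges:
    print(a, "+", b, "cost:", round(cost, 3))
```

The sequence of merges recorded in `merges` is the information a dendrogram draws: early, cheap merges sit low in the tree and late, expensive ones sit near the top.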

1B

Determining the number of clusters (k) using the elbow method

©2021 Lamba and Madhusudhan - all rights reserved


For clustering in R, the elbow method was used to determine the number of clusters.
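The idea behind the elbow method can be sketched in a few lines of Python: run k-means for several values of k, record the within-cluster sum of squares (WCSS), and look for the k after which the improvement flattens out. This is a minimal sketch and not the book's R code. The points are hypothetical, and the k-means routine uses a simple deterministic initialization chosen only to keep the example reproducible.

```python
# Hypothetical 2-D points forming three loose groups (assumption; not the case study's data).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans_wcss(data, k, iterations=50):
    """Run basic k-means and return the within-cluster sum of squares."""
    # Deterministic initialization: take centers spread evenly through the list.
    centers = [data[i * len(data) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in data)

wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 3))
```

With these toy points the WCSS falls sharply up to k = 3 and changes little afterwards, which is the "elbow" the method looks for.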

Visualizing distance matrices

©2021 Lamba and Madhusudhan - all rights reserved


The Euclidean distance was used to measure the distance between the documents.
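To make the distance measure concrete, the following Python sketch builds term-count vectors for a tiny corpus and computes the Euclidean distance between every pair of documents. The corpus, the document names, and the whitespace tokenization are all assumptions made for illustration. The case study's own preprocessing and R code are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus (assumption; not the case study's documents).
docs = {
    "doc_a": "library data mining text",
    "doc_b": "library text mining mining",
    "doc_c": "malaria network analysis",
}

def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.split())
    return [counts[term] for term in vocabulary]

def euclidean(u, v):
    """Square root of the summed squared differences between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

vocabulary = sorted({term for text in docs.values() for term in text.split()})
vectors = {name: term_vector(text, vocabulary) for name, text in docs.items()}

# Full pairwise distance matrix, keyed by (document, document).
distance_matrix = {
    (a, b): euclidean(vectors[a], vectors[b])
    for a in vectors
    for b in vectors
}

for (a, b), d in distance_matrix.items():
    if a < b:
        print(a, b, round(d, 3))
```

Documents that share most of their terms, such as `doc_a` and `doc_b` here, end up close together, while documents with no terms in common end up far apart. A matrix like this is what a distance heatmap visualizes.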

Agglomerative hierarchical clustering

©2021 Lamba and Madhusudhan - all rights reserved


Hierarchical clustering with dendrograms is another way to visualize the distance between the documents.

Circular Dendrogram

©2021 Lamba and Madhusudhan - all rights reserved


A circular dendrogram is yet another way to visualize the distance between the documents.

Phylogenic Dendrogram

©2021 Lamba and Madhusudhan - all rights reserved


A phylogenic layout is another way of visualizing the same results from a different perspective, depending on your research problem and dataset.

4A

Core topics

Timeline showing the core topics in DESIDOC Journal of Library and Information Technology from 1981 to 2018 (©2019 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2019))


Fifty core topics were identified that fit the corpus of 928 DJLIT research articles, of which only 29 were unique.

4B

Core topics

Latent Dirichlet Allocation Topic and Word Result for PQDT Global ETDs during 2014-2018 (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))


The results show the topics assigned to the corpus of ETDs.

4C

Method 1: Plotting top words using stm package

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the results for 5 topics using Structural Topic Modeling (STM).

Method 2: Plotting MAP histogram using stm package

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows a second way of representing the results from Method 1.

Method 3: Visualizing topic model using ggplot2

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows a third way of representing the results from Methods 1 and 2.

Method 4: Interactive Visualization

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows a fourth way of representing the results from Methods 1, 2, and 3.

Understanding topics through top 5 representative documents

©2021 Lamba and Madhusudhan - all rights reserved


The table presents the top five representative ETDs for the modeled topics, ranked according to their probability.

Topic correlation

©2021 Lamba and Madhusudhan - all rights reserved


The figure shows the correlation between the topics as a network graph.

5A

Network Text Analysis of Documents using Bibliometrix in R

Word Co-Occurrence Network (©2021 Lamba and Madhusudhan - all rights reserved)


The figure presents the word co-occurrence network for the top 50 words representing the literature on malaria indexed in the Web of Science (WoS) database for the year 2019.

5B

Network Text Analysis of Documents using Textnets in R

Text Network (©2021 Lamba and Madhusudhan - all rights reserved)


The figure shows the 22 clusters (communities) of 238 words (nodes) identified by the network text analysis of the data.

6A

Burst Detection of Documents using Sci2

6B

Burst Detection of Documents using R

7A

Percentage comparison for polarities

Polarity Percentage (©2018 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2018))


The figure shows the percentage comparison of polarities across 20 different productivity facets.

Percentage comparison for subjectivities

Subjectivity Percentage (©2018 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2018))


The figure shows the percentage comparison of subjectivities across 20 different productivity facets.

7B

Percentage-Based Means

©2021 Lamba and Madhusudhan - all rights reserved

8A

Predictive Modeling of Documents using RapidMiner

Screenshot of evaluation result (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))


The figure shows the evaluation results for library science ETDs in the PQDT Global database.